Skip to content
This repository has been archived by the owner on Nov 25, 2024. It is now read-only.

State storage refactor #1839

Merged
merged 18 commits into from
Apr 26, 2021
Merged

State storage refactor #1839

merged 18 commits into from
Apr 26, 2021

Conversation

neilalexander
Copy link
Contributor

@neilalexander neilalexander commented Apr 20, 2021

This PR refactors the state snapshot and state block storage.

Snapshots and blocks will now be deduplicated using a hash key and stored in a single array column rather than multiple rows, which should reduce overheads significantly when pulling state information out of the database and will reduce the amount of disk space used by state storage dramatically.

There is an included migration which will upgrade the state storage - this can take hours on a large database. Most importantly, there is no reverse migration so this is one-way for now.

@kegsay kegsay self-requested a review April 20, 2021 15:15
@neilalexander neilalexander marked this pull request as ready for review April 26, 2021 08:38
Copy link
Member

@kegsay kegsay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, some questions and clarifications would be good in the code as it's rather thin on comments.

roomserver/types/types.go Show resolved Hide resolved
// nolint: gocyclo
func (d *Database) create(db *sql.DB) error {
if err := createEventStateKeysTable(db); err != nil {
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could probably do some shenanigans like we do for StatementList to avoid the constant if statements but this works too.

@@ -24,6 +24,7 @@ import (

func LoadFromGoose() {
goose.AddMigration(UpAddForgottenColumn, DownAddForgottenColumn)
goose.AddMigration(UpStateBlocksRefactor, DownStateBlocksRefactor)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good question and I am now not sure I understand what the LoadFromGoose function is for — it's a delta package-wide exported function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's two ways to execute migrations, one via goose CLI and one via Dendrite. This is for the former. It feels weird that this function is living in this file. We do need a function like this but it should probably live elsewhere.

roomserver/storage/postgres/events_table.go Show resolved Hide resolved
roomserver/storage/sqlite3/events_table.go Show resolved Hide resolved
roomserver/storage/shared/storage.go Outdated Show resolved Hide resolved
_roomserver_state_snapshots.state_snapshot_nid
FROM
_roomserver_state_snapshots
LIMIT $1 OFFSET $2)) AS _roomserver_state_block
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we use an ORDER BY here to ensure we are getting the same ordering each time we do this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it really matters. We end up sorting the NIDs in Go anyway, and since the migration transaction only gets committed if the entire thing succeeds, we don't really need to care about the ordering that the migration runs in.

@neilalexander neilalexander merged commit 5ce1fe8 into master Apr 26, 2021
@neilalexander neilalexander deleted the neilalexander/staterefactor branch April 26, 2021 12:25
@neilalexander neilalexander added the changes database schema This PR changes the database schema and might be breaking label Apr 26, 2021
danpe pushed a commit to globekeeper/dendrite that referenced this pull request May 2, 2021
* Hash-deduplicated state storage (and migrations) for PostgreSQL and SQLite

* Refactor droomserver database setup for migrations

* Fix conflict statements

* Update migration names

* Set a boundary for old to new block/snapshot IDs so we don't rewrite them more than once accidentally

* Create sequence if not exists

* Fix boundary queries

* Fix boundary queries

* Use Query

* Break out queries a bit

* More sequence tweaks

* Query parameters are not playing the game

* Injection escaping may not work for CREATE SEQUENCE after all

* Fix snapshot sequence name

* Use boundaried IDs in SQLite too

* Use IFNULL for SQLite

* Use COALESCE in PostgreSQL

* Review comments @kegsay
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
C-Roomserver changes database schema This PR changes the database schema and might be breaking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants